Two robots of the same type#
The code for this example is implemented in same_robots. Let us import it.
[1]:
from enki_env.examples import same_robots
Environment#
The environment contains just two Thymio robots. The initial position of the robots is fixed, while their orientation is sampled uniformly. The robots' goal is to rotate so as to face each other.
To create the environment via script, run:
python -m enki_env.examples.same_robots.environment
[2]:
env = same_robots.make_env(render_mode="human")
env.reset()
env.snapshot()
[2]:
The robots belong to the same "thymio" group and share the same configuration.
[3]:
env.group_map
[3]:
{'thymio': ['thymio_0', 'thymio_1']}
As in the single robot example, the robots use just their proximity sensors and receive a similar reward that encourages them to rotate until they face each other, at which point the episode terminates.
[4]:
env.action_spaces
[4]:
{'thymio_0': Box(-1.0, 1.0, (1,), float64),
'thymio_1': Box(-1.0, 1.0, (1,), float64)}
[5]:
env.observation_spaces
[5]:
{'thymio_0': Dict('prox/value': Box(0.0, 1.0, (7,), float64)),
'thymio_1': Dict('prox/value': Box(0.0, 1.0, (7,), float64))}
Baseline#
We have hand-coded a simple distributed policy to achieve the task.
To evaluate the baseline via script, run:
python -m enki_env.examples.same_robots.baseline
[6]:
import inspect
print(inspect.getsource(same_robots.Baseline.predict))
def predict(self,
            observation: Observation,
            state: State | None = None,
            episode_start: EpisodeStart | None = None,
            deterministic: bool = False) -> tuple[Action, State | None]:
    prox = observation['prox/value']
    if any(prox > 0):
        prox = prox / np.max(prox)
        ws = np.array((0.5, 0.25, 0, -0.25, -0.5, 1, 1))
        w = np.dot(ws, prox)
    else:
        w = 1
    return np.clip([w], -1, 1), None
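The arithmetic above can be checked with a small standalone sketch (just the weighting logic, without the environment or the Baseline class):

```python
import numpy as np

def baseline_action(prox):
    # Mirrors the hand-coded policy: weight the proximity readings to steer
    # toward the strongest detection; spin in place when nothing is detected.
    if np.any(prox > 0):
        prox = prox / np.max(prox)
        ws = np.array((0.5, 0.25, 0, -0.25, -0.5, 1, 1))
        w = np.dot(ws, prox)
    else:
        w = 1  # no reading: keep rotating to search for the other robot
    return np.clip([w], -1, 1)

# Other robot detected by the central front sensor (weight 0): no turn needed.
print(baseline_action(np.array([0., 0., 1., 0., 0., 0., 0.])))  # [0.]
# Nothing detected: full rotation speed.
print(baseline_action(np.zeros(7)))  # [1.]
```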
To perform a rollout, we need to assign the policy to the whole group.
[7]:
rollout = env.unwrapped.rollout(max_steps=10, policies={'thymio': same_robots.Baseline()})
For multi-robot environments, rollouts return a dictionary with the data collected from each group.
[8]:
rollout.keys()
[8]:
dict_keys(['thymio'])
[9]:
rollout['thymio'].episode_reward
[9]:
np.float64(-23.985014107779367)
Reinforcement Learning#
Let us now train and evaluate an RL policy for the same task.
To perform this via script, run:
python -m enki_env.examples.same_robots.rl
[10]:
policy = same_robots.get_policy()
[11]:
rollout = env.unwrapped.rollout(max_steps=10, policies={'thymio': policy})
rollout['thymio'].episode_reward
[11]:
np.float64(-16.99562117827852)
Video#
To generate a video similar to the one in the single robot example, run
python -m enki_env.examples.same_robots.video
or
[13]:
video = same_robots.make_video()
video.display_in_notebook(fps=30, width=640, rd_kwargs=dict(logger=None))
MoviePy - Building video __temp__.mp4.
MoviePy - Writing video __temp__.mp4
MoviePy - Done !
MoviePy - video ready __temp__.mp4
[13]: